#######################################
# This file describes the datasets and provides a guide for the 
# replication materials for the paper:
# "Justice as Checks and Balances: Indigenous claims in the courts of colonial Mexico"
# Author: Edgar Franco-Vivanco
# July 2021
########################################

#-------------------------------------#
##### Datasets
#-------------------------------------#
## The replication files rely on the following datasets:

- agn_indios_2021.csv: Dataset using individual claims as the unit of analysis. 
                   These documents are part of the larger AGN catalog; this dataset
                   contains those associated with indigenous claims.

- rebel_2021.csv: Panel data at province-year level of small-scale violent events 
                  (indigenous riots, protests, etc.).

- near3.dbf: Distance in km of each of the colonial provinces to Mexico City.

- gerhard_centroid.dbf: Distance in km of the colonial provinces to Mexico City 
                        with province identifiers.

- kings_viceroys_new_spain.csv: Table including dummy variables by year for each Viceroy,
                             King, and war in which Spain was involved.

- pob_decades.csv: Total indigenous population and proportion under encomienda by decade 
                   for each colonial province

- acatlan_piastla.csv: Proportion of indigenous population under encomienda by decade in Acatlan y Piastla
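
The .dbf files listed above are dBase tables rather than CSV files. A minimal sketch of one way to read them, assuming the `foreign` package that ships with standard R installations; the toy column names `prov_id` and `dist_km` are hypothetical and need not match the real files:

```r
library(foreign)  # recommended package shipped with standard R installations

# Toy table standing in for a distance file; the column names are hypothetical.
toy <- data.frame(prov_id = c(1L, 2L), dist_km = c(12.5, 340.2))

# Write the toy table to a temporary .dbf file and read it back.
dbf_path <- tempfile(fileext = ".dbf")
write.dbf(toy, dbf_path)
distances <- read.dbf(dbf_path, as.is = TRUE)
print(distances)

# With the replication data in the working directory, the same call would be:
# near3 <- read.dbf("near3.dbf", as.is = TRUE)
```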

### Human coding
- classification_matrix_2021.csv: Sample of ~3000 documents manually classified

### LBSF
- lbsf_2021.csv: Dataset for the ~2000 cases classified by Lesley B. Simpson

### Panel Data
- agn_panel_2021.csv: Panel data at decade-province level

- high_merit_panel.csv: Panel of the high merit cases by province
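
For readers unfamiliar with the decade-province structure, the sketch below shows how claim-level records could be collapsed into such a panel. It is an illustration only, not the author's construction; the toy data and the column names `province`, `year`, and `n_claims` are hypothetical:

```r
# Toy claim-level data; provinces and years are made up for illustration.
claims <- data.frame(
  province = c("A", "A", "A", "B"),
  year = c(1601, 1605, 1612, 1608)
)

# Map each year to the first year of its decade (1605 becomes 1600).
claims$decade <- (claims$year %/% 10) * 10

# Count claims within each province-decade cell.
panel <- aggregate(
  list(n_claims = claims$year),
  by = list(province = claims$province, decade = claims$decade),
  FUN = length
)
print(panel)
```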

### AGN dataset
- agn_s.csv: Full set of 619,000 colonial documents stored in the AGN

### Maps of colonial provinces (available upon request)
- Mex_terr_2016.shp

### Indigenous population
- indian_pop_estimated.csv: Estimates of the indigenous and non-indigenous population based on secondary sources.

### Detailed data about the specific variables can be found in:
- codebooks_Justice_EFV.xlsx: Color scheme:
  - red: main dataset
  - yellow: helper datasets for the main analysis
  - blue: lbsf
  - green: human coding
  - purple: panel data
  - gray: entire AGN catalog

#-------------------------------------#
##### R files
#-------------------------------------#
### To run these files, make sure that the file paths correspond to your own directory
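
A minimal sketch of one way to handle the path, assuming all datasets sit in a single folder. The folder name stored in `replication_dir` is hypothetical, and the scripts themselves may set paths differently:

```r
# Hypothetical folder holding the replication files; replace with your own path.
replication_dir <- "~/justice_replication"

# Build the full path to a dataset named in this README.
claims_path <- file.path(replication_dir, "agn_indios_2021.csv")

# Read the file only if it is present, so the sketch also runs without the data.
if (file.exists(claims_path)) {
  agn_indios <- read.csv(claims_path, stringsAsFactors = FALSE)
  str(agn_indios)
} else {
  message("File not found; check replication_dir: ", claims_path)
}
```

Using file.path() rather than pasting strings keeps the path separators portable across operating systems.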

## The replication files rely on the following R scripts:

- main_results_replication_2021.R: This script generates the main results in the document
  using the files as units of analysis. 

- panel_results_replication_2021.R: This script works with panel data at province-decade level

- human_coding_replication_2021.R: This script uses the human-coded sample of ~3000 cases.

- lbsf_replication_2021.R: This script works with the ~2000 cases coded by Lesley B. Simpson which
  contain additional metadata, particularly the defendant in each case. 

- plot_indigenous_pop_2021.R: Script used to create Figure S1.1

- geography_replication_2021.R: This script creates maps for the supplementary material

- agn_analysis.R: Creates the analysis for S11 in supplementary materials.

#-------------------------------------#
##### Main Text
#-------------------------------------#
# Note: The lines specified in each script are approximations

### FIGURES
### Figure 1. Proportion of Documents by Type of Ruling (1590-1820)
Using the agn_indios file, the code in main_results_replication_2021.R (lines 92 to 128) creates a stacked plot
of positive and negative cases.

### Figure 2. Predicted Probability of Favorable Ruling Conditioned on Topic of Claim and Local Balance of Powers
These figures display the predicted probabilities using the results from Table 4. Note that to create these figures I relied on
the Zelig package, which is not compatible with newer R versions. The code includes some instructions to work around this issue.
Use main_results_replication_2021.R (lines 273 to 450).
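
If Zelig cannot be installed, point predictions of this kind can be obtained with base R. The sketch below is not the author's code: it assumes a logit specification (which this README does not confirm), uses simulated data, and the variable names `land_topic`, `strong_elite`, and `favorable` are hypothetical. It yields point predictions only, without the simulation-based uncertainty that Zelig reports:

```r
set.seed(42)

# Simulated data; variable names are hypothetical, not those in agn_indios_2021.csv.
n <- 500
sim <- data.frame(
  land_topic = rbinom(n, 1, 0.4),
  strong_elite = rbinom(n, 1, 0.5)
)
linear_index <- -0.2 + 0.8 * sim$land_topic -
  0.5 * sim$land_topic * sim$strong_elite
sim$favorable <- rbinom(n, 1, plogis(linear_index))

# Logit model with an interaction between topic and local elite strength.
fit <- glm(favorable ~ land_topic * strong_elite, data = sim, family = binomial)

# Predicted probability of a favorable ruling for land claims in each scenario.
scenarios <- data.frame(land_topic = 1, strong_elite = c(0, 1))
scenarios$pred_prob <- predict(fit, newdata = scenarios, type = "response")
print(scenarios)
```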

### Figure 3. Estimates of Favorable Ruling by Perpetrators and Type of Claim
This figure shows the coefficients associated with different defendants by type of claim. The figure is created with the LBSF dataset.
Use lbsf_replication_2021.R (lines 64 to 212)

### Figure 4. Determinants of Total Claims 
Figure displaying the coefficients for the panel analysis at province-decade level appearing in Table A4.1. 
Use panel_results_replication_2021.R (lines 49 to 86)


#### TABLES
### Table 4. Favorable Ruling, Topics, and Balance of Powers
This table displays the main results of the paper, that is, probability of a favorable ruling conditioned
on topics and balance of powers. Created with code in main_results_replication_2021.R (lines 155 to 272)


#-------------------------------------#
##### Appendix
#-------------------------------------#

### FIGURES
### Figure A2.1: First-difference effects under diverse scenarios
This figure displays the first-difference effects under distinct scenarios of population balance of powers.
To create this figure, load the Zelig library. Created with code in main_results_replication_2021.R (lines 664 to 749)

#### TABLES
#### Table A2.1: Favorable Ruling, Topics, and Balance of Powers (interactive models)
Table presenting the full interactions between topics, population decline, and local elite strength from model A1
Created with code in main_results_replication_2021.R (lines 455 to 526)


### Table A2.2 Simple regression effects and Wald test for linear equality
This table collects the simple regression effects for the triple interaction model
Created with code in main_results_replication_2021.R (lines 530 to 659)

#### Tables A2.3 and A2.4. Favorable Ruling, Topics, and Actors
Results of the model using the LBSF to test for the effect of the defendant. Table A2.4 includes province FE
Use lbsf_replication_2021.R (lines 215 to 280)

#### Table A3.1 and A3.2. Favorable Ruling and Topics of Claims (Human coding)
These tables show the results for the analysis using the ~3000 cases manually coded according to the 
rules described in Appendix S4
Use human_coding_replication_2021.R (lines 30 to 260)

#### Table A4.1. Determinants of the Number of Claims
Results using panel data at province-decade level to test the effect of province characteristics on number of claims.
Use panel_results_replication_2021.R (lines 49 to 113)

#### Table A4.2. Distribution of documents by number of pages
Distribution of document length. This measure is used to classify high merit/complexity cases
Created with code in main_results_replication_2021.R (lines 765 to 786)

#### Table A4.3. Determinants of the number of claims (High merit cases)
Results using panel data at province-decade level to test the effect of province characteristics on number of claims for
high merit cases
Use panel_results_replication_2021.R (lines 88 to 131)

#### Table A4.4. High merit cases and local balance of powers
Analysis using high merit as a DV to test for bias. 
Created with code in main_results_replication_2021.R (lines 784  to 833)


#### Table A4.5. Favorable ruling, topics, and balance of powers (high merit/complexity cases)
Analysis using high complexity cases only
Created with code in main_results_replication_2021.R (lines 837  to 960)


#-------------------------------------#
##### Supplementary Materials 
#-------------------------------------#

### FIGURES
### Figure S1.1. The Decline and Recovery of the Native Population
Use plot_indigenous_pop_2021.R

### Figure S1.2. Indigenous Claims in the GIC, 1592-1820
This plot displays a smoothed line of the total number of cases by year
Created with code in main_results_replication_2021.R (lines 1021 to 1029)

### Figure S1.3. Convergence Effects Post-1700
Figure showing the convergence effects across time for favorable ruling by topic
Created with code in main_results_replication_2021.R (lines 1135 to 1215)

### Figure S3.2. Geolocated Claims of the LBSF

### Figure S3.3. Monthly Trend of Claims (1589-1688)
Figure showing the monthly distribution of claims in the LBSF
Use lbsf_replication_2021.R (lines 290 to 302)

### Figure S3.4. Distribution of the Number of Claims by Individual Town (1589-1688)
Histogram of number of claims by town
Use lbsf_replication_2021.R (lines 328 to 336)

### Figure S3.5. Comparison Between LBSF Sample and Full Sample by Year
These two plots show the temporal distribution of files in the LBSF and in the entire dataset
Use lbsf_replication_2021.R (lines 341 to 359)

### Figure S3.6. Most Frequent Words in Claims Against Different Perpetrators
Most common words in the text for each claim by perpetrator
Use lbsf_replication_2021.R (lines 454 to 555)

### Figure S5.1. Provinces and Regions of Colonial Mexico
Use geography_replication.R (lines 36 to 71)

### Figure S5.2. Indigenous Claims by Province
Map of claim intensity by province 
Use geography_replication.R (lines 76 to 105)

### Figure S7.1. Proportion of Indigenous Population Living Under Encomienda System in Acatlan y Piastla 
Evolution of the encomienda system in Acatlan y Piastla
Use code in main_results_replication_2021.R (lines 1670 to 1682)

### Figure S9.1. Claims by Topic Across Time 
Figure showing the evolution of topic prevalence from 1592 to 1821.
Use code in main_results_replication_2021.R (lines 1338 to 1366)

### Figure S9.2 Favorable Ruling for Provinces with High and Low Court Intensity Usage
Figure showing rate of success for provinces with high and low court intensity usage
Use code in main_results_replication_2021.R (lines 1489 to 1525)

### Figure S11.1 Total Files in the Colonial Archives by Year
Temporal distribution of colonial documents in the AGN
Use agn_analysis.R (lines 16 to 22)

#### TABLES #######################
### Table S1.1. Favorable Ruling, Topics, and Vulnerability after Bourbon reform
Results of determinants of favorable ruling comparing before and after Bourbon reform
Use code in main_results_replication_2021.R (lines 1033 to 1132)

### Table S1.2. Favorable Ruling and Political Factors
Regression including FE for Viceroys, Kings, and external wars
Use code in main_results_replication_2021.R (lines 1217 to 1275)

### Table S1.3 Favorable Ruling, Topics, and Balance of Powers (Direct vs. Indirect Rule)
This table shows the results for provinces that never experienced encomienda (direct rule) vs those
which were under this system
Use code in main_results_replication_2021.R (lines 1276 to 1333)

### Table S2.1. Favorable Ruling and Population Change
This table shows the results for the effects of favorable ruling on population change.
The model uses panel data at province-decade level. 
Use panel_results_replication_2021.R (lines 134 to 207)

### Table S3.2. Complainants and Defendants in LBSF
This table shows the distribution by actor in the LBSF
Use lbsf_replication_2021.R (lines 290 to 302)

### Table S3.3. Provinces with Large Numbers of Claims
The 12 provinces with the largest number of claims in the LBSF
Use lbsf_replication_2021.R (lines 305 to 311)

### Table S3.4 Proportion of Cases by Type in the LBSF Sample and the Full Sample
Table showing proportion of cases by topic in both datasets
Use lbsf_replication_2021.R (lines 366 to 392)

### Table S3.5 Distribution of Cases by Actors/Perpetrators
This table shows the proportion of cases by topic and actor in the LBSF
Use lbsf_replication_2021.R (lines 393 to 410)

### Table S3.6 Perpetrators and Topics of Claims
Table showing the coefficients of each actor using topics as regression variable
Use lbsf_replication_2021.R (lines 412 to 450)

### Table S4.1. Fleiss Kappa
This table presents the inter-coder reliability (ICR) results using Fleiss' Kappa as a measure.
Use human_coding_replication_2021.R (lines 266 to 430)
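
For reference, Fleiss' Kappa can be computed in a few lines of base R. The sketch below is not taken from human_coding_replication_2021.R; the function name `fleiss_kappa`, the three coders, and the toy labels are hypothetical. It assumes every document is rated by the same number of coders and that at least two categories appear in the data:

```r
# Fleiss' Kappa for a matrix of ratings: rows are documents, columns are coders.
fleiss_kappa <- function(ratings) {
  ratings <- as.matrix(ratings)
  n_raters <- ncol(ratings)
  categories <- sort(unique(as.vector(ratings)))

  # counts[i, j]: number of coders who assigned document i to category j.
  counts <- matrix(0, nrow(ratings), length(categories))
  for (j in seq_along(categories)) {
    counts[, j] <- rowSums(ratings == categories[j])
  }

  # Observed agreement per document, then averaged over documents.
  p_i <- (rowSums(counts^2) - n_raters) / (n_raters * (n_raters - 1))
  p_bar <- mean(p_i)

  # Agreement expected by chance from the overall category shares.
  p_j <- colSums(counts) / (nrow(ratings) * n_raters)
  p_e <- sum(p_j^2)

  (p_bar - p_e) / (1 - p_e)
}

# Toy example: three hypothetical coders label four documents.
toy_ratings <- rbind(
  c("land", "land", "land"),
  c("labor", "labor", "land"),
  c("labor", "labor", "labor"),
  c("land", "labor", "land")
)
print(fleiss_kappa(toy_ratings))
```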

### Table S5.1. Favorable Ruling, Topics, and Balance of Power (central provinces)
Use geography_replication.R (lines 108 to 246)

### Table S5.2. Favorable Ruling, Topics, and Balance of Powers (Northern Frontier)
Use geography_replication.R (lines 250 to 287)

### Table S6.2. Uprising and Successful Claims 
This table shows the relationship between rebellions at province level and successful claims 
Use panel_results_replication_2021.R (lines 211 to 207)

### Table S6.3. Granger Test Results 
Granger test between rebellions and claims
Use panel_results_replication_2021.R (lines 270 to 289)
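
The logic of a Granger test can be sketched with base R as an F-test comparing a model of claims on their own lag against one that also includes lagged rebellions. This is a single-series illustration with simulated data and one lag, not the panel specification in panel_results_replication_2021.R; all variable names are hypothetical:

```r
set.seed(1)

# Simulated series in which past rebellions help predict current claims.
n <- 200
rebellions <- rnorm(n)
claims <- numeric(n)
for (i in 2:n) {
  claims[i] <- 0.5 * claims[i - 1] + 0.4 * rebellions[i - 1] + rnorm(1)
}

# Align each observation with its one-period lags.
lagged <- data.frame(
  claims = claims[-1],
  claims_lag = claims[-n],
  rebellions_lag = rebellions[-n]
)

# Restricted model: own lag only. Unrestricted model: adds lagged rebellions.
restricted <- lm(claims ~ claims_lag, data = lagged)
unrestricted <- lm(claims ~ claims_lag + rebellions_lag, data = lagged)

# F-test of the null that lagged rebellions add no predictive power.
granger_f <- anova(restricted, unrestricted)
print(granger_f)
```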

### Table S8.2. Rates of protection based on merits and province type (LE)
Rates of protection under weak and strong LE
Created with code in main_results_replication_2021.R (lines 934 to 1011)

### Table S8.3. Rates of protection based on merits and province type (Population)
Rates of protection under population decline or increase
Created with code in main_results_replication_2021.R (lines 934 to 1011)

### Table S9.1. Topics of the Case and Prior Success
These results show the effect of balance of powers and prior rate of success on topic prevalence
Use human_coding_replication_2021.R (lines 1370 to 1485)

### Table S10.1. Single-topic Cases and Concurrences 
Table showing cases with a single topic 
Created with code in main_results_replication_2021.R (lines 1527 to 1555)

### Table S10.2. Favorable Ruling, Topics, and Balance of Power (Single-topic cases)
Table showing the main results using only cases with a single topic
Created with code in main_results_replication_2021.R (lines 1557 to 1666)

### Table S11.1 Top Ten Categories in the Colonial Archives
Distribution of colonial documents in the AGN by topic
Use agn_analysis.R (lines 9 to 13)
